Implement Clustering and App related functions #4407

Open

naiming-zededa wants to merge 2 commits into master from naiming-cluster-hypervisor

Conversation

naiming-zededa
Contributor

  • change VMI to VMI ReplicaSet for Kubernetes
  • change Pod to Pod ReplicaSet for containers
  • change functions handling ReplicaSet names in services
  • subscribe to EdgeNodeInfo in domainmgr and zedmanager to get the node-name for the cluster
  • add Designated Node ID to several structs for App
  • do not delete the domain from Kubernetes if this is not the Designated App node
  • parse config for EdgeNodeClusterConfig in zedagent
  • handle the ENClusterAppStatus publication in zedmanager in the multi-node clustering case
  • zedmanager effective-activation handling now includes ENClusterAppStatus
  • kubevirt hypervisor changes to handle VMI/Pod ReplicaSets

}
if err := hyper.Task(status).Cleanup(status.DomainName); err != nil {
log.Errorf("failed to cleanup domain: %s (%v)", status.DomainName, err)
// in cluster mode, we cannot delete the pod just because we failed to get app info
Contributor

Was the issue this is fixing something appearing when there is a failover/takeover and another node in the cluster starts running the app instance?
Or is it something which could happen when an app instance is first provisioned on the cluster?

Contributor Author

This can happen even on the first node of the app deployment. Sometimes we cannot get the status from the k3s cluster, or it takes time for the app to reach the running state, but we should not remove the Kubernetes configuration: the cluster has the config stored in its database and has its own scheduling and control process to eventually bring the app to the intended state. If we delete the config from the cluster, we would have to wait another 10 minutes or so to retry, which would cause confusion.

Contributor Author
naiming-zededa, Nov 7, 2024

So a new boolean, DomainConfigDeleted, is introduced in DomainStatus. It allows the designated node, when it knows for sure the app instance has been removed from the device, to go ahead and delete the app/domain from the cluster.
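A minimal sketch of how such a flag could gate the cluster-side deletion; the struct and helper below are simplified stand-ins for illustration, not the actual pillar types (only DomainConfigDeleted and the designated-node flag come from this discussion):

package sketch

// DomainStatus is a trimmed-down stand-in for types.DomainStatus.
type DomainStatus struct {
    DomainName          string
    IsDNidNode          bool // this node is the Designated App node
    DomainConfigDeleted bool // the app instance config was removed from the device
}

// maybeDeleteFromCluster applies the rule described above: only the
// designated node, and only when it knows the app instance is really
// gone from the device, removes the object from Kubernetes. Otherwise
// the cluster keeps the config and reconciles the app on its own.
func maybeDeleteFromCluster(status DomainStatus, deleteFromKube func(name string) error) error {
    if !status.IsDNidNode || !status.DomainConfigDeleted {
        return nil
    }
    return deleteFromKube(status.DomainName)
}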

Contributor

It would be good to capture the above explanation either in a comment here or in pkg/pillar/docs/zedkube.md

Contributor Author

Sure. I documented this in zedkube.md and referenced it from domainmgr.go.

Contributor

Hmm, re-reading it, it still looks odd.
Does the kubevirt Info() return an error or does it return HALTED when the issue is merely that it can't (yet) fetch the status from k3s?
Can we not fix that Info() to return something more accurate?

If it returns an error or HALTED then this function will set an error, and that error might be propagated to the controller, which would be confusing if the task is slow at starting, or is starting on some other node.

So I think this check is in the wrong place.

Contributor Author

The current logic in kubevirt.go is:

  1. if the app is not found, it returns status "" and an error: logError("getVMIStatus: No VMI %s found with the given nodeName %s", repVmiName, nodeName)
  2. if the app is found on another node in the cluster, it returns status "nolocal" with no error
  3. if the app is found on this node, it returns whatever running status Kubernetes reports for it

With the above conditions, if an error is returned the status is set to types.Broken.
We further map these status strings with the following mapping:

// Use a few states for now
var stateMap = map[string]types.SwState{
    "Paused":     types.PAUSED,
    "Running":    types.RUNNING,
    "NonLocal":   types.RUNNING,
    "shutdown":   types.HALTING,
    "suspended":  types.PAUSED,
    "Pending":    types.PENDING,
    "Scheduling": types.SCHEDULING,
    "Failed":     types.FAILED,
}

That is a good point; the error condition when the app is not yet running would be confusing. I can change condition 1) above to 'Scheduling', which is a currently defined state and roughly reflects the Kubernetes app status.
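A hedged sketch of that proposed tweak, using stand-in types rather than the real kubevirt.go code, showing condition 1) mapping to 'Scheduling' instead of producing an error (and hence types.Broken):

package sketch

// SwState loosely mirrors types.SwState for this illustration only.
type SwState int

const (
    UNKNOWN SwState = iota
    SCHEDULING
    PENDING
    RUNNING
    PAUSED
    HALTING
    FAILED
)

// stateMap is a subset of the mapping quoted above.
var stateMap = map[string]SwState{
    "Running":    RUNNING,
    "NonLocal":   RUNNING,
    "Pending":    PENDING,
    "Scheduling": SCHEDULING,
    "Failed":     FAILED,
}

// mapVMIStatus sketches the proposed behavior: when the VMI/Pod is not
// found yet (condition 1), report SCHEDULING rather than an error, so
// the app is not marked Broken while the cluster is still scheduling it.
func mapVMIStatus(found bool, k8sStatus string) SwState {
    if !found {
        return SCHEDULING // previously: "" plus an error -> types.Broken
    }
    if st, ok := stateMap[k8sStatus]; ok {
        return st
    }
    return UNKNOWN
}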

naiming-zededa force-pushed the naiming-cluster-hypervisor branch from 1aed95e to 2e3e102 on November 7, 2024 18:50
github-actions bot requested a review from eriknordmark on November 7, 2024 18:51
naiming-zededa force-pushed the naiming-cluster-hypervisor branch from 2e3e102 to 8a13e80 on November 7, 2024 21:24
naiming-zededa force-pushed the naiming-cluster-hypervisor branch from 8a13e80 to b7140d2 on December 3, 2024 19:11
github-actions bot requested a review from milan-zededa on December 3, 2024 19:11
naiming-zededa force-pushed the naiming-cluster-hypervisor branch from b7140d2 to 2453125 on December 10, 2024 05:17
@naiming-zededa
Contributor Author

I have rebased and resolved the conflicts. Please review and see if the PR is ok.


codecov bot commented Dec 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 20.90%. Comparing base (a73c78d) to head (2453125).
Report is 45 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4407      +/-   ##
==========================================
- Coverage   20.93%   20.90%   -0.03%     
==========================================
  Files          13       13              
  Lines        2895     2894       -1     
==========================================
- Hits          606      605       -1     
  Misses       2163     2163              
  Partials      126      126              


@@ -3603,3 +3703,16 @@ func lookupCapabilities(ctx *domainContext) (*types.Capabilities, error) {
}
return &capabilities, nil
}

func getnodeNameAndUUID(ctx *domainContext) error {
Contributor

Suggested change
func getnodeNameAndUUID(ctx *domainContext) error {
func (ctx *domainContext) retrieveNodeNameAndUUID() error {

as this is not a getter method

Contributor Author

Ok.


// Add pod non-image volume disks
if len(diskStatusList) > 1 {
leng := len(diskStatusList) - 1
Contributor

g stands for?

Contributor Author

changed to 'length'.

@christoph-zededa
Contributor

> I have rebased and resolved the conflicts. Please review and see if the PR is ok.

I don't see any tests.

@naiming-zededa
Contributor Author

> I have rebased and resolved the conflicts. Please review and see if the PR is ok.
>
> I don't see any tests.

Added hypervisor/kubevirt_test.go now.

naiming-zededa force-pushed the naiming-cluster-hypervisor branch from acd9962 to 1129c28 on December 11, 2024 06:24
- change VMI to VMI ReplicaSet for Kubernetes
- change Pod to Pod ReplicaSet for containers
- change functions handling ReplicaSet names in services
- subscribe to EdgeNodeInfo in domainmgr and zedmanager to get the
  node-name for the cluster
- add Designated Node ID to several structs for App
- do not delete the domain from Kubernetes if this is not the
  Designated App node
- parse config for EdgeNodeClusterConfig in zedagent
- handle the ENClusterAppStatus publication in zedmanager in the
  multi-node clustering case
- zedmanager effective-activation handling now includes ENClusterAppStatus
- kubevirt hypervisor changes to handle VMI/Pod ReplicaSets

Signed-off-by: Naiming Shen <[email protected]>
naiming-zededa force-pushed the naiming-cluster-hypervisor branch from 1129c28 to 7db2529 on December 13, 2024 02:54
@@ -3603,3 +3709,16 @@ func lookupCapabilities(ctx *domainContext) (*types.Capabilities, error) {
}
return &capabilities, nil
}

func (ctx *domainContext) retrieveNodeNameAndUUID() error {
Contributor

This is our own node's node name, right?
Can't it retrieve that at domainmgr startup?

And why is "UUID" part of the name of the function?

Contributor Author

I used to do this at the startup of domainmgr, but 'waitEdgeNodeInfo' only waits for a maximum of 5 minutes, so I could not rely on it. You pointed this out in a previous review of this PR and I removed it. Good point, I should also remove this function here.
I also used to have UUID in the function name in the code before this PR; I will remove that.

if err := hyper.Task(status).Cleanup(status.DomainName); err != nil {
log.Errorf("failed to cleanup domain: %s (%v)", status.DomainName, err)
// in cluster mode, we cannot delete the pod just because we failed to get app info
if !ctx.hvTypeKube {
Contributor

And I get the feeling that most or all of these hvTypeKube checks should not be in domainmgr but be inside the kubevirt hypervisor package.

If, for instance, zedmanager is telling domainmgr to shut down or delete a task, and for kubevirt this isn't needed except on the designated node, can't the designated-or-not check be done inside the hypervisor/kubevirt code?

Contributor Author

I can move this into the hypervisor functions. Mainly we need to pass in whether this domain config is being deleted or not (DomainStatus.DomainConfigDeleted), so we need to change the API from '.Delete(domainName string)' to '.Delete(status *types.DomainStatus)' (which also changes the signature for the other hypervisors).
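A rough sketch of the signature change being discussed; the interface below is a placeholder with a single method, not the real pillar hypervisor interface:

package sketch

// DomainStatus stands in for types.DomainStatus; only the fields needed
// for this illustration are shown.
type DomainStatus struct {
    DomainName          string
    DomainConfigDeleted bool
}

// taskDeleter illustrates the API change: Delete receives the full status
// instead of only the domain name, so hypervisor-specific code (kubevirt)
// can see DomainConfigDeleted and similar fields.
type taskDeleter interface {
    // before: Delete(domainName string) error
    Delete(status *DomainStatus) error
}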

Contributor

Did you move it there? If so, this change and the preceding comment can be removed from verifyStatus(), right?

if zcfgCluster == nil {
log.Functionf("parseEdgeNodeClusterConfig: No EdgeNodeClusterConfig, Unpublishing")
pub := ctx.pubEdgeNodeClusterConfig
items := pub.GetAll()
Contributor

Is there only a global key for this? If so there is no need to call GetAll - just call Unpublish("global")

Otherwise call GetAll and walk the set of returned objects and unpublish each of their keys.
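A sketch of the two options against a minimal stand-in for the publication handle (the real pillar pubsub API may differ in details):

package sketch

// publication is a minimal stand-in for the pubsub publication used above.
type publication interface {
    GetAll() map[string]interface{}
    Unpublish(key string) error
}

// If EdgeNodeClusterConfig only ever uses the "global" key:
func unpublishGlobal(pub publication) error {
    return pub.Unpublish("global")
}

// Otherwise walk the currently published items and unpublish each key:
func unpublishAll(pub publication) error {
    for key := range pub.GetAll() {
        if err := pub.Unpublish(key); err != nil {
            return err
        }
    }
    return nil
}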

Contributor Author

Ok.

handleENClusterAppStatusImpl(ctx, key, nil)
}

func handleENClusterAppStatusImpl(ctx *zedmanagerContext, key string, status *types.ENClusterAppStatus) {


This function should be updated from our POC branch. Or I can do it as part of my PR for failover handling.


Actually, I suggest amending this commit with the latest handleclusterapp.go and applogs.go from the PoC branch.
That way they can be reviewed as a unit, and my follow-up PRs need not include those files.

@@ -154,7 +154,7 @@ func (z *zedkube) checkAppsStatus() {
}

pub := z.pubENClusterAppStatus
stItmes := pub.GetAll()
stItems := pub.GetAll()


Please update this function from the POC branch, it changed a lot.

Contributor Author

@zedi-pramodh as discussed, since some of the type booleans changed, updating the PoC code in applog.go would require updating many other files as well. I'll leave it to your later PR to add those changes.


OK, no problem. I will take care of those and submit my PR once this one is merged.

- changed the function to retrieveDeviceNodeName() and call it only at
  the start of domainmgr Run()
- removed the ctx.hvTypeKube and status.IsDNidNode checks in the
  domainmgr.go code; also removed status.DomainConfigDeleted. We now
  rely on the normal domain delete/cleanup workflow
- fixed a bug where nodeName contained an underscore, which is not
  allowed in Kubernetes names
- changed the zedmanager/handleclusterstatus.go code to the PoC code
  base, and commented out one line for a later PR to handle
- implemented the scheme where, if kubevirt cannot contact the
  Kubernetes API server or the cluster has not scheduled the Pod/VMI
  yet, we now return the 'Unknown' status. A starting 'Unknown'
  timestamp is kept per application
- also, if the 'Unknown' status lasts longer than 5 minutes, it changes
  into the 'Halt' status reported back to domainmgr
- updated the 'zedkube.md' section 'Handle Domain Apps Status in
  domainmgr' for the above behavior

Signed-off-by: Naiming Shen <[email protected]>
naiming-zededa force-pushed the naiming-cluster-hypervisor branch from 3e80fde to ee49db4 on December 28, 2024 01:58
}
enInfo := NodeInfo.(types.EdgeNodeInfo)
ctx.nodeName = strings.ReplaceAll(strings.ToLower(enInfo.DeviceName), "_", "-")
log.Noticef("retrieveDeviceNodeName: devicename, NodeInfo %v", NodeInfo) // XXX
Contributor

Why the XXX comment?

// Get the EdgeNode info, needed for kubevirt clustering
err = domainCtx.retrieveDeviceNodeName()
if err != nil {
log.Fatal(err)
Contributor

You can't fatal here since you only wait if hvTypeKube.
How about making Eden/Adam send the EdgeNodeInfo and wait even if not hvTypeKube?
That would remove some special cases like this issue.

@@ -1103,6 +1106,16 @@ func initPublications(zedagentCtx *zedagentContext) {
}
getconfigCtx.pubZedAgentStatus.ClearRestarted()

zedagentCtx.pubEdgeNodeClusterConfig, err = ps.NewPublication(pubsub.PublicationOptions{
AgentName: agentName,
Persistent: true,
Contributor

Why does this need to be persistent? That makes it much more painful to add fields to the EdgeNodeClusterConfig in future EVE releases.


isDNiDnode := false
if aiConfig.DesignatedNodeID != uuid.Nil && aiConfig.DesignatedNodeID == ctx.nodeUUID {
isDNiDnode = true
Contributor

Should this be set to true on a single node cluster? For a non-kubevirt case?
It would make sense to add some comment about the expectations in those cases.

st, _ := sub.Get(key)
if st != nil {
clusterStatus := st.(types.ENClusterAppStatus)
if !clusterStatus.ScheduledOnThisNode {
Contributor

No publishing seems odd.

I can understand that we want the controller to ignore Info messages (and perhaps also Metrics messages) for app instances which are not scheduled on this node, but it would make more sense to either put that check in the controller (which might require some new fields in the Info and Metrics protobuf messages) or put code in zedagent to exclude sending Info and Metrics for such an app instance. But publish here, so other agents see a Status for each Config.
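A hedged sketch of the second option (always publish the status, and filter only the controller-facing output in zedagent); the names below are illustrative, not actual pillar code:

package sketch

// ENClusterAppStatus is a stand-in for the type discussed in this PR.
type ENClusterAppStatus struct {
    ScheduledOnThisNode bool
}

// shouldSendAppInfo sketches the zedagent-side filter: the status is
// published unconditionally for other agents, and only the Info/Metrics
// messages to the controller are suppressed when the app instance is
// scheduled on another cluster node.
func shouldSendAppInfo(clusterMode bool, st *ENClusterAppStatus) bool {
    if !clusterMode || st == nil {
        return true // single-node or non-cluster case: always report
    }
    return st.ScheduledOnThisNode
}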

@@ -1563,6 +1617,14 @@ func updateBasedOnProfile(ctx *zedmanagerContext, oldProfile string) {
}

// returns effective Activate status based on Activate from app instance config and current profile
Contributor

Comment needs to be updated.

}

if ctx.nodeUUID == uuid.Nil {
err := getnodeNameAndUUID(ctx)
Contributor

Please fetch this when zedmanager starts and not here.

@@ -282,8 +282,8 @@ func waitForNodeReady(client *kubernetes.Clientset, readyCh chan bool, devUUID s
if err != nil {
return err
}
if len(pods.Items) < 6 {
return fmt.Errorf("kubevirt running pods less than 6")
if len(pods.Items) < 4 { // need at least 4 pods to be running
Contributor

Where does the magic number 4 come from? Please expand on comment.

@@ -1378,3 +1707,33 @@ func (ctx kubevirtContext) VirtualTPMTerminate(domainName string, wp *types.Watc
func (ctx kubevirtContext) VirtualTPMTeardown(domainName string, wp *types.WatchdogParam) error {
return fmt.Errorf("not implemented")
}

func getMyNodeUUID(ctx *kubevirtContext, nodeName string) {
Contributor

Reads more like a "set" or "save" function than a "get" function.


### Handle Domain Apps Status in domainmgr

When an application is launched and managed in KubeVirt mode, the Kubernetes cluster is provisioned for this application as either a VMI (Virtual Machine Instance) ReplicaSet object or a Pod ReplicaSet object. Kubernetes uses a declarative approach to manage the desired state of applications: the configurations are saved in the Kubernetes database, and the Kubernetes controllers use them to ensure the objects eventually reach the intended state if possible. A particular VMI/Pod of a domain may therefore not be in working condition at the time EVE's domainmgr checks. In the domainmgr code running in KubeVirt mode, if it cannot contact the Kubernetes API server to query the application, or if the application itself has not been started yet in the cluster, kubevirt.go returns the 'Unknown' status. It keeps an 'Unknown' status starting timestamp per application. If the 'Unknown' status lasts longer than 5 minutes, the status functions in kubevirt.go return the 'Halt' status back to domainmgr. The timestamp is cleared once the application status can be obtained from Kubernetes again.
Contributor

Reading the code it looks like it will return "Halting" and not "Halt" status.
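A minimal sketch of the per-application timestamp scheme described in the doc text above, returning "Halting" after the timeout as this comment points out; the map and helper are illustrative, not the actual kubevirt.go implementation:

package sketch

import (
    "sync"
    "time"
)

const unknownStateTimeout = 5 * time.Minute

var (
    mu           sync.Mutex
    unknownSince = make(map[string]time.Time) // per-app start of the 'Unknown' condition
)

// effectiveState reports "Unknown" while the app status cannot be fetched
// from the Kubernetes API server (or the Pod/VMI is not scheduled yet),
// switches to "Halting" once that condition has lasted longer than
// 5 minutes, and clears the timestamp as soon as a real status arrives.
func effectiveState(appName, k8sState string, fetchFailed bool) string {
    mu.Lock()
    defer mu.Unlock()
    if !fetchFailed {
        delete(unknownSince, appName)
        return k8sState
    }
    start, ok := unknownSince[appName]
    if !ok {
        unknownSince[appName] = time.Now()
        return "Unknown"
    }
    if time.Since(start) > unknownStateTimeout {
        return "Halting"
    }
    return "Unknown"
}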

@@ -1138,6 +1175,11 @@ func maybeRetryBoot(ctx *domainContext, status *types.DomainStatus) {
log.Errorf("Failed to setup vTPM for %s: %s", status.DomainName, err)
}

// pass nodeName to hypervisor call Setup
if status.NodeName == "" {
Contributor

Can't NodeName be set when DomainStatus is created in handleCreate?

@@ -1684,6 +1726,11 @@ func doActivate(ctx *domainContext, config types.DomainConfig,
log.Errorf("Failed to setup vTPM for %s: %s", status.DomainName, err)
}

// pass nodeName to hypervisor call Setup
if status.NodeName == "" {
Contributor

Ditto
